AITopics | memory constraint

memory constraint

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

TinyTTA: Efficient Test-time Adaptation via Early-exit Ensembles on Edge Devices

Neural Information Processing Systems, Mar-20-2026, 10:15:22 GMT

The increased adoption of Internet of Things (IoT) devices has led to the generation of large data streams with applications in healthcare, sustainability, and robotics. In some cases, deep neural networks have been deployed directly on these resource-constrained units to limit communication overhead, increase efficiency and privacy, and enable real-time applications. However, a common challenge in this setting is the continuous adaptation of models necessary to accommodate changing environments, i.e., data distribution shifts. Test-time adaptation (TTA) has emerged as one potential solution, but its validity has yet to be explored in resource-constrained hardware settings, such as those involving microcontroller units (MCUs). TTA on constrained devices generally suffers from i) memory overhead due to the full backpropagation of a large pre-trained network, ii) lack of support for normalization layers on MCUs, and iii) either memory exhaustion with large batch sizes required for updating or poor performance with small batch sizes. In this paper, we propose TinyTTA, to enable, for the first time, efficient TTA on constrained devices with limited memory. To address the limited memory constraints, we introduce a novel self-ensemble and batch-agnostic early-exit strategy for TTA, which enables continuous adaptation with small batch sizes for reduced memory usage, handles distribution shifts, and improves latency efficiency. Moreover, we develop the TinyTTA Engine, a first-of-its-kind MCU library that enables on-device TTA.
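
The abstract does not spell out the early-exit rule, so the following is only a generic sketch of confidence-based early exit, not the TinyTTA method itself. It assumes each exit head produces a list of logits and that an exit is accepted once its softmax entropy drops below a threshold; `early_exit_predict` and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_predict(exit_logits: Sequence[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Return (exit_index, predicted_class) for the first confident exit.

    Hypothetical helper: the last exit is always accepted as a fallback.
    """
    if not exit_logits:
        raise ValueError("exit_logits must not be empty")
    last = len(exit_logits) - 1
    for index, logits in enumerate(exit_logits):
        probs = softmax(logits)
        if entropy(probs) <= threshold or index == last:
            return index, max(range(len(probs)), key=probs.__getitem__)
    raise AssertionError("unreachable")


# Two hypothetical exits: the first is uncertain, the second is confident.
print(early_exit_predict([[0.1, 0.0, 0.05], [4.0, 0.0, 0.0]], threshold=0.5))
```

Stopping at an early exit skips the deeper layers, which is where the latency saving in this kind of scheme comes from.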

artificial intelligence, machine learning, proceedings, (11 more...)

Neural Information Processing Systems

Genre: Research Report (0.38)

Industry: Information Technology (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Aggregating Capacity in FL through Successive Layer Training for Computationally-Constrained Devices

Neural Information Processing Systems, Feb-13-2026, 23:56:54 GMT

If the required memory to train a model exceeds this limit, the device will be excluded from the training.
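
The exclusion rule stated above can be illustrated with a minimal sketch. The device names, memory figures, and the `eligible_devices` helper are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def eligible_devices(device_memory_mb: Dict[str, float], required_mb: float) -> List[str]:
    """Return the devices whose memory limit can hold the training footprint.

    A device is excluded when the required memory exceeds its limit.
    """
    return sorted(name for name, limit in device_memory_mb.items() if limit >= required_mb)


# Hypothetical fleet with memory limits in MB.
fleet = {"phone": 512.0, "sensor": 64.0, "gateway": 2048.0}
print(eligible_devices(fleet, required_mb=256.0))
```

Under this rule the data held by excluded devices never contributes to the model, which is the loss of capacity the paper's title refers to.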

artificial intelligence, fedrolex, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > Virginia (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Bristol (0.04)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

Igor Fedorov, Ryan P. Adams, Matthew Mattina, Paul Whatmough

Neural Information Processing Systems, Feb-11-2026, 08:46:38 GMT

Neural Information Processing Systems http://nips.cc/

cnn, mcus, sparse, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(2 more...)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(4 more...)

Efficient Combination of Rematerialization and Offloading for Training DNNs

Neural Information Processing Systems, Feb-11-2026, 03:26:24 GMT

Rematerialization and offloading are two well-known strategies to save memory during the training phase of deep neural networks, allowing data scientists to consider larger models, batch sizes, or higher-resolution data.
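
As a rough illustration of rematerialization only (offloading is not covered), the sketch below backpropagates through a chain of `tanh` layers while storing every `segment`-th layer input and recomputing the rest. The scalar chain, the function names, and the uniform checkpoint spacing are assumptions made for the example, not the scheduling strategy of the paper.

```python
import math
from typing import Dict, List


def layer(x: float) -> float:
    """One toy layer of the chain."""
    return math.tanh(x)


def layer_grad(x: float) -> float:
    """Derivative of the toy layer with respect to its input x."""
    return 1.0 - math.tanh(x) ** 2


def grad_full(x0: float, n: int) -> float:
    """Baseline: store all n layer inputs, then backpropagate."""
    inputs: List[float] = []
    x = x0
    for _ in range(n):
        inputs.append(x)
        x = layer(x)
    g = 1.0
    for xi in reversed(inputs):
        g *= layer_grad(xi)
    return g


def grad_rematerialized(x0: float, n: int, segment: int) -> float:
    """Store only every `segment`-th layer input and recompute the others."""
    checkpoints: Dict[int, float] = {}
    x = x0
    for i in range(n):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(x)
    g = 1.0
    # Process segments from the last to the first, as backpropagation does.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        # Rematerialize the inputs of layers start .. end-1 from the checkpoint.
        xs = [checkpoints[start]]
        for _ in range(start, end - 1):
            xs.append(layer(xs[-1]))
        for xi in reversed(xs):
            g *= layer_grad(xi)
    return g


print(math.isclose(grad_full(0.3, 10), grad_rematerialized(0.3, 10, segment=4)))
```

The gradient is unchanged; the trade is fewer stored inputs (one per segment plus one live segment) against extra forward evaluations.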

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Hypothesis Selection with Memory Constraints

Neural Information Processing Systems, Dec-26-2025, 10:50:56 GMT

Hypothesis selection is a fundamental problem in learning theory and statistics. Given a dataset and a finite set of candidate distributions, the goal is to select a distribution that matches the data as well as possible. More specifically, suppose we have sample access to an unknown distribution $P$ over a domain $\mathcal{X}$ that we know is well-approximated by one of a class of $n$ distributions (a.k.a.
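
To make the problem statement concrete, here is a minimal sketch of one simple baseline, maximum-likelihood selection over a finite set of discrete candidates. It is not the memory-constrained algorithm of the paper; the function name and the toy distributions are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def select_hypothesis(samples: Sequence[str], hypotheses: List[Dict[str, float]]) -> int:
    """Return the index of the candidate with the highest log-likelihood.

    Each hypothesis maps an outcome to its probability (assumed to sum to 1).
    """
    def log_likelihood(hypothesis: Dict[str, float]) -> float:
        total = 0.0
        for sample in samples:
            p = hypothesis.get(sample, 0.0)
            if p <= 0.0:
                return float("-inf")  # the hypothesis rules this sample out
            total += math.log(p)
        return total

    scores = [log_likelihood(h) for h in hypotheses]
    return max(range(len(hypotheses)), key=scores.__getitem__)


# Hypothetical candidates over the domain {"a", "b"}.
candidates = [{"a": 0.5, "b": 0.5}, {"a": 0.8, "b": 0.2}, {"a": 0.1, "b": 0.9}]
print(select_hypothesis(["a", "a", "b", "a"], candidates))
```

This baseline keeps all samples in memory, which is the kind of cost a memory-constrained variant has to avoid.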

hypothesis selection, memory constraint, name change, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Constrained deep neural network architecture search for IoT devices accounting for hardware calibration

Neural Information Processing Systems, Dec-26-2025, 03:49:00 GMT

Deep neural networks achieve outstanding results for challenging image classification tasks. However, the design of network topologies is a complex task, and the research community is conducting ongoing efforts to discover top-accuracy topologies, either manually or by employing expensive architecture searches. We propose a unique narrow-space architecture search that focuses on delivering low-cost and rapidly executing networks that respect strict memory and time requirements typical of Internet-of-Things (IoT) near-sensor computing platforms. Our approach provides solutions with classification latencies below 10 ms running on a low-cost device with 1 GB RAM and a peak performance of 5.6 GFLOPS. The narrow-space search of floating-point models improves the accuracy on CIFAR10 of an established IoT model from 70.64% to 74.87% within the same memory constraints. We further improve the accuracy to 82.07% by including 16-bit half types and obtain the highest accuracy of 83.45% by extending the search with model-optimized IEEE 754 reduced types. To the best of our knowledge, this is the first empirical demonstration of more than 3000 trained models that run with reduced precision and push the Pareto optimal front by a wide margin. Within a given memory constraint, accuracy is improved by more than 7 percentage points for half and more than 1 percentage point for the best individual model format.
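
One reason reduced-precision types help under a fixed memory constraint is simple arithmetic: narrower weights leave room for more parameters. The sketch below works this out for a hypothetical 1 MiB weight budget; the budget and the `max_parameters` helper are assumptions for illustration, not figures from the paper.

```python
def max_parameters(budget_bytes: int, bits_per_weight: int) -> int:
    """Largest number of weights that fits in the budget at the given bit width."""
    if bits_per_weight <= 0:
        raise ValueError("bits_per_weight must be positive")
    return (budget_bytes * 8) // bits_per_weight


budget = 1024 * 1024  # hypothetical 1 MiB budget for weights only
for name, bits in [("float32", 32), ("float16", 16)]:
    print(name, max_parameters(budget, bits))
# float32 262144
# float16 524288
```

Halving the width doubles the parameter count that fits, so the search can consider larger topologies within the same memory limit.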

architecture search, deep neural network architecture search, iot device accounting, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.82)

DiskChunGS: Large-Scale 3D Gaussian SLAM Through Chunk-Based Memory Management

Feldmann, Casimir, Wilder-Smith, Maximum, Patil, Vaishakh, Oechsle, Michael, Niemeyer, Michael, Tateno, Keisuke, Hutter, Marco

arXiv.org Artificial Intelligence, Dec-1-2025

Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated impressive results for novel view synthesis with real-time rendering capabilities. However, integrating 3DGS with SLAM systems faces a fundamental scalability limitation: methods are constrained by GPU memory capacity, restricting reconstruction to small-scale environments. We present DiskChunGS, a scalable 3DGS SLAM system that overcomes this bottleneck through an out-of-core approach that partitions scenes into spatial chunks and maintains only active regions in GPU memory while storing inactive areas on disk. Our architecture integrates seamlessly with existing SLAM frameworks for pose estimation and loop closure, enabling globally consistent reconstruction at scale. Our method uniquely completes all 11 KITTI sequences without memory failures while achieving superior visual quality, demonstrating that algorithmic innovation can overcome the memory constraints that have limited previous 3DGS SLAM methods.

Recent advances in neural representations for 3D scene reconstruction have revolutionized novel view synthesis, with 3D Gaussian Splatting (3DGS) [1] emerging as an exceptionally efficient and high-quality approach. Unlike volume-based methods [2]-[4] that struggle with rendering speed due to expensive ray marching, 3DGS provides real-time rendering capabilities while maintaining impressive visual fidelity.
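
The out-of-core idea can be sketched as a small cache that keeps a bounded number of spatial chunks resident and spills the least recently used one to disk. This is an illustrative CPU-only sketch, not the DiskChunGS implementation: the `ChunkCache` class, the 2D grid keys, the chunk size, and the pickle storage format are all assumptions.

```python
import pickle
import tempfile
from collections import OrderedDict
from pathlib import Path
from typing import Any, List, Tuple

ChunkKey = Tuple[int, int]


def chunk_of(x: float, y: float, size: float) -> ChunkKey:
    """Map a 2D position to the grid cell (chunk) that contains it."""
    return (int(x // size), int(y // size))


class ChunkCache:
    """Keep at most `capacity` chunks in memory; evict the least recently used to disk."""

    def __init__(self, capacity: int, directory: Path) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.directory = directory
        self.resident: "OrderedDict[ChunkKey, List[Any]]" = OrderedDict()

    def _path(self, key: ChunkKey) -> Path:
        return self.directory / f"chunk_{key[0]}_{key[1]}.pkl"

    def get(self, key: ChunkKey) -> List[Any]:
        """Return the chunk contents, loading them from disk if they are not resident."""
        if key in self.resident:
            self.resident.move_to_end(key)  # mark as most recently used
        else:
            path = self._path(key)
            self.resident[key] = pickle.loads(path.read_bytes()) if path.exists() else []
            self._evict()
        return self.resident[key]

    def _evict(self) -> None:
        """Write the oldest chunks to disk until the cache fits its capacity."""
        while len(self.resident) > self.capacity:
            old_key, data = self.resident.popitem(last=False)
            self._path(old_key).write_bytes(pickle.dumps(data))


with tempfile.TemporaryDirectory() as tmp:
    cache = ChunkCache(capacity=2, directory=Path(tmp))
    for point in [(1.0, 1.0), (12.0, 3.0), (25.0, 7.0), (2.0, 2.0)]:
        cache.get(chunk_of(*point, size=10.0)).append(point)
    print(sorted(cache.resident))  # chunks still in memory
    print(cache.get((0, 0)))       # reloaded chunk holds both of its points
```

Peak memory is bounded by the capacity rather than by the scene size, at the cost of disk reads when the active region moves.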

artificial intelligence, gaussian, keyframe, (16 more...)

arXiv.org Artificial Intelligence

2511.2303

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.65)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse

Wang, Shaojie, Wang, Jinghui, Cui, Yinghan, Chen, Xuxing, Wang, Chao, Huang, Liang, Zhang, Xiaojiang, Peng, Junyi, Wan, Li, Zhang, Haotian, Chen, Bin

arXiv.org Artificial Intelligence, Nov-25-2025

In agentic LLM scenarios, an agent's interaction process during a single rollout often exhibits branching behaviors. Due to memory retrieval and concurrent tool executions at certain decision points, the token trajectory of one task evolves into a tree-like structure rather than a linear sequence. However, current training pipelines decompose such tree-structured trajectories into separate linear segments, treating each branch as an independent sequence. As a result, shared prefixes across these branches are repeatedly recomputed during both forward and backward passes. To address this inefficiency, we propose Tree Training, a paradigm that computes each shared prefix only once and reuses its intermediate results across related branches during both forward and backward passes, substantially improving computation efficiency in large-scale agentic training. This is achieved via (i) Tree Packing, which efficiently reuses shared computations across trajectories, and (ii) Gradient Restoration, which ensures correct gradient propagation across reused prefixes. Experiments on multiple open-source models demonstrate up to 3.9x reduction in total training time, enabling more efficient agentic LLM SFT and RL training.
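
The saving from shared-prefix reuse can be estimated by counting tokens. The sketch below compares the per-branch token count with the number of unique nodes in a prefix tree built from the same branches. It only counts work and does not implement Tree Packing or Gradient Restoration; the function names and toy token IDs are hypothetical.

```python
from typing import Dict, Sequence


def linear_token_count(branches: Sequence[Sequence[int]]) -> int:
    """Tokens processed when every branch is treated as an independent sequence."""
    return sum(len(branch) for branch in branches)


def tree_token_count(branches: Sequence[Sequence[int]]) -> int:
    """Tokens processed when each shared prefix is computed once (unique trie nodes)."""
    root: Dict[int, dict] = {}
    count = 0
    for branch in branches:
        node = root
        for token in branch:
            if token not in node:
                node[token] = {}
                count += 1
            node = node[token]
    return count


# Three hypothetical rollouts that share the prefix [1, 2].
rollouts = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6]]
print(linear_token_count(rollouts), tree_token_count(rollouts))  # 11 6
```

The ratio between the two counts gives a rough upper bound on the compute saved by prefix reuse for a given set of rollouts.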

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.00413

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Memory Constrained Dynamic Subnetwork Update for Transfer Learning

Quélennec, Aël, Mozharovskyi, Pavlo, Nguyen, Van-Tam, Tartaglione, Enzo

arXiv.org Artificial Intelligence, Oct-27-2025

On-device neural network training faces critical memory constraints that limit the adaptation of pre-trained models to downstream tasks. We present MeDyate, a theoretically-grounded framework for memory-constrained dynamic subnetwork adaptation. Our approach introduces two key innovations: LaRa (Layer Ranking), an improved layer importance metric that enables principled layer pre-selection, and a dynamic channel sampling strategy that exploits the temporal stability of channel importance distributions during fine-tuning. MeDyate dynamically resamples channels between epochs according to importance-weighted probabilities, ensuring comprehensive parameter space exploration while respecting strict memory budgets. Extensive evaluation across a large panel of tasks and architectures demonstrates that MeDyate achieves state-of-the-art performance under extreme memory constraints, consistently outperforming existing static and dynamic approaches while maintaining high computational efficiency. Our method represents a significant step towards enabling efficient on-device learning by demonstrating effective fine-tuning with memory budgets as low as a few hundred kB of RAM.
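
A minimal sketch of importance-weighted channel sampling under a memory budget follows. The abstract does not give the exact sampling procedure, so this is an assumption: channels are drawn without replacement with probability proportional to their importance, and a drawn channel is kept only if its cost still fits the budget. The `sample_channels` helper, the importance scores, and the per-channel costs are hypothetical.

```python
import random
from typing import List, Sequence


def sample_channels(
    importance: Sequence[float],
    cost_kb: Sequence[float],
    budget_kb: float,
    rng: random.Random,
) -> List[int]:
    """Pick channel indices to update, weighted by importance, within the budget.

    Importance scores are assumed to be strictly positive.
    """
    remaining = list(range(len(importance)))
    chosen: List[int] = []
    used = 0.0
    while remaining:
        weights = [importance[c] for c in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)  # sample without replacement
        if used + cost_kb[pick] <= budget_kb:
            chosen.append(pick)
            used += cost_kb[pick]
    return sorted(chosen)


# Hypothetical scores and costs for six channels; resample once per epoch.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
costs = [40.0, 40.0, 60.0, 20.0, 50.0, 30.0]
rng = random.Random(0)
for epoch in range(3):
    print(epoch, sample_channels(scores, costs, budget_kb=120.0, rng=rng))
```

Because the draw is repeated every epoch, low-importance channels are still updated occasionally, while the per-epoch memory cost never exceeds the budget.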

artificial intelligence, machine learning, selection, (18 more...)

arXiv.org Artificial Intelligence

2510.20979

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
